Monte Carlo procedure for protein folding in lattice models. Conformational rigidity. 
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A rigourous Monte-Carlo method for protein folding simulation on lattice model is introduced. We 
show that a parameter which can be seen as the rigidity of the conformations has to be introduced 
in order to satisfy the detailed balance condition. Its properties are discussed and its role during the 
folding process is elucidated. This method is applied on small chains on two-dimensional lattice. 
A Bortz-Kalos-Lebowitz type algorithm which allows to study the kinetic of the chains at very 
low temperature is implemented in the presented method. We show that the coefficients of the 
Arrhenius law are in good agreement with the value of the main potential barrier of the system. 

PACS numbers: 05.10.Ln, 05.20.Dd, 87.15.Aa 
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Proteins are heteropolymers that exbihit surprising 
thermodynamic and kinetic properties. The first aspect 
is that the lowest free energy conformation of a protein 
is assumed to be the unique native structure and to be 
thermodynamically stable A major challenge in theo- 
retical protein folding is to understand the second aspect 
or in other words, how does a protein find its native struc- 
ture in biologically reasonnable times under physiological 
conditions 0. The lattice model is one class of models 
that is used to study theoretically the folding of protein 
|5|-|5| and Monte Carlo (MC) algorithms || are widely 
used to study dynamics P^,0,ffo|. 

In this Letter, we show that the commonly used MC 
procedure converges poorly towards thermal equilibrium. 
An attempt to refine the procedure has been recently pro- 
posed by Cieplak et al. [jl3],[l5| , but even if this procedure 
converges towards equilibrium, the parameters of the Ar- 
rhenius law that they found disagree with the value of 
the main potential barrier obtained independently by a 
study of the phase space of the systems. We introduce, 
here, a more rigorous treatment of the dynamics. Our 
method fulfil the detailed balance condition, and, then, 
converges, indeed, towards the thermal equilibrium. For 
the first time, it also shows a good efficiency in the cal- 
culation of kinetics parameters and the determination of 
the Arrhenius law. 

The model used is a two-dimensional lattice polymer. 
The chains are composed of N monomers that are con- 
nected and constrained to be on a square lattice and the 
chains are self avoiding walk. The energy of a sequence 
in a given conformation to is given by: 
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monomers interact i.e. if they are nearest neighbors on 
the lattice. The B'^s are the contact energy values. They 
are chosen randomly in a gaussian distribution centered 
on 0, and Bq is a negative parameter which favors the 



compact conformations (IjJI^l. The set of gives a 
sequence of the chain. 

The sets of connections between conformations, used 
for the MC procedure, are those used by Chan and Dill 
[Q : the corner flip and the tail moves are referred to as 
the move set 1 (MSI), the crankshaft move is referred to 
as the move set 2 (MS2) and at each MC step, a move of 
MSI is chosen with a probability r and a move of MS2 is 
chosen with a probability 1 — r Q . 

Now, the problem is to find a correct algorithm of 
Metropolis ra] which guarantees that the simulation con- 
verges towards thermal equilibrium imposed by the con- 
dition of the detailed balance : 



P^W(m -»■ n) = P { ™ ] W{n -» m) 
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where Pffl 



oc exp(— E^ m ' /T) is the equilibrium proba- 
bility of the conformation to, T is the temperature, and 
W(m — > n) is the probability of transition from the state 
to to the state n. Let us note : 
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where ( to — > n) is the a priori transition probability. 
A convenient choice for the acceptance ratio is : 
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l + exp(AE™/T) 
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with AE™ = E^-E im K Let us note N, ( n> and N£> the 
number of allowed transitions from m to any conforma- 
tion by performing a move of the MSI or of the MS2 and 
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can easily see that NmLx = N + 2 and NmLx — N — 7. 
In order to have symmetric a priori transition probabili- 
ties : W^°\m — » n ) = W(°)(n -► to), one assumes that 
the probability to attempt a move from conformation m 
to conformation n related by a connection of respectively 
MSI and MS2 during one MC step are then : 
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brackets denote the average over all the conformations. 
If a simulation checks well the detailed balance, the C(t) 
quantities should tend towards when t — > oo. 




FIG. 1. A part of the connection graph of the 12 monomers 
chain. The conformations (a), (b) and (c) are connected to 
respectively one, two and five neighbors by MSI. In the classi- 
cal MC procedure, the a priori transition probabilities are not 
symmetric. They depend on the number of neighbors. With 
the proposed method, all these transitions are attempted with 
the same a priori transition probability (not show). 



Then, the probability to attempt any move from the 
conformation m using the MSI is rN$/(N + 2) (and 
(1 - r)N$/(N - 7) using the MS2). And therefore, it 
appears a probability of null transition : 
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In contrast with rigid rotation which can involve move- 
ments of a lot of monomers, the one and two monomers 
moves are local modifications. One assumes, then, that 
they have the same affinity. Then, it comes from equa- 
tions | and | : 



N ■ 



2N -5 



(8) 



In this particular case, the previous equations simplify 
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In order to check the accuracy of the proposed pro- 
cedure, we applied it on 12 monomers chains. These 
chains can adopt 15037 different self avoiding walk con- 
formations non equivalent by symmetry. The following 
results are obtained for the sequence A defined elsewhere 
p[ . Such a short chain is used to check the method be- 
cause a convergence test can be applied to this chain in 
a reasonable computational time. 

For this chain, we performed 300 billions steps 
MC trajectories. A convergence factor C(t) — 

\J((Peq n) -occ(™)(0) 2 ) is computed each 100000 MC 
steps ; t is for number of the MC step and occ^ TO ^(t) = 
N^(t)/t where N^(t) is the number of steps corre- 
sponding to the occurences of the conformation m. The 




7 12 17 22 27 

ln(t) 

FIG. 2. Log-Log plots of the convergence factor C(t) 
versus the number of MC step t for different temperatures. 
Dashed lines : for the commonly used method for which the 
W^ \m — + n) prefactor and parameter are omitted in 
the MC procedure. Solid lines : for the proposed method. 

Figure ^| shows clearly that the commonly used pro- 
cedure present limits of convergence depending on the 
temperature. On the other hand, the proposed method 
shows a power law of the convergence factor versus the 
MC steps. This result shows very well that the factor 
Wm cannot be omitted in a lattice simulation for protein 
folding. 

In what follows, we focus on the properties of the Wr, 
factor. One must notice that this factor is only a topo- 
logical factor and therefore is sequence independent. If 
one looks now at the simulation only at the topologi- 
cal point of view, by removing for a while the energetic 
contribution (let suppose for a while that all conforma- 
tions have the same energy), one sees that, the larger 
the factor w^m , the longer the simulation stays in the m 
conformation when it reaches it. One must note that it 
is not only unprobable to escape from the conformation 
to if Wm is large, but it is also unprobable to reach it. 
On the contrary, conformations with small values of Wm 
are often reach but the simulation doesn't stay in this 
conformation. Then, the larger win , the more rigid the 
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conformation to and the smaller 
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the more flexible 



the conformation 

„(°) 



Therefore, let us call in what fol- 
lows Wm' the rigidity of the conformation m. Fig. ||(b) 
show how the Wm prefactors are distributed for each sub- 
set of conformations with the same number of contacts. 
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No conformation have a value of Win equal to 1 (fig. ||) . 
This guarantees that, no conformation is totally rigid, 
then each one is related to at least, another one. But, 
one must note that this condiction is not enough strong 
to fulfil the ergodic hypothesis. 




FIG. 3. Distribution of the number of conformations as 
a function of the rigidity Wm and the number of intrachain 
contacts N c divided by the number of conformations which 
have N c contacts. One must note that the smaller the number 
of contacts of the conformations of a subset the larger the 
normalisation factor. 

It appears clearly that, the more compact conforma- 
tions present the larger values of the rigidity. The more 
flexible are the more extended. Only one move of MSI is 
allowed for the two more rigid conformations. One can 
see that, there is no conformation which have a value of 
Km which tends towards 0. Hence no conformation is 
totally flexible. This is a consequence of that no confor- 
mation present the maximum number of neighbors with 
both the MSI and the MS2. 

The native conformations of protein has not only very 
low energy but also they are very compact. Hence, they 
have of large Boltzmann weights but also they are very 
rigid conformations. Both effects favorise the stability 
of the native conformations, but the folding dynamics is 
slowed down by the topology of the native structures. 
The trap conformations are also very compact and are 
conformation of local energy minima @. Then to exit 
the trap valley the chain has to first escape from a stable 
and rigid conformation. 

One computed many kinetic ways from the trap con- 
formation to the native structure of the sequence A. The 
trap conformation has been determined by solving the 
master equation of the system using the way described 
by Cieplak et al [jllj for the particular choice of r used in 
the present paper. The conformation trap found here is 
the same that the conformation found by Cieplak et al. 
and it is chosen as the first conformation of the MC tra- 
jectories. The kinetic ways exhibit all similar properties. 
The native and the trap conformations are compact and 
then very rigid (w[^ tif — w trap = 0.894) and have low 
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FIG. 4. Energy (top) and rigidity (bottom) versus the 
MC steps of a typical trajectory of folding simulation for the 
sequence A at T = 0.4. 

At low temperature the system spends a lot of MC 
steps in the trap conformation. The system escapes un- 
easily from the trap by passing through transition states 
which exhibit common properties : high energies, few 
intrachain contacts and then great flexibility. Therefore, 
even if the transition states are energetically unfavorable, 
they are easily accessible at a topological point of view 
and the MC trajectories spend a very few steps in these 
conformations. 

A major problem of the protein folding investigation is 
namely to calculate kinetic properties at low temperature 
jl7],[l0|], where the rejected move ratio of a MC procedure 
is very large. The efficiency of the procedure is increased 
at low temperature using a Bortz-Kalos-Lebowitz (BKL) 
type algorithm [|12j. The idea is the following : let us 
note, w m the probability not to accept a move from the 
conformation m during one step : 

1 1 

Wm =1 > t-. ; — r (ff) 

2iV-5 ^ 1 + exp (AE™T) v ' 

then, the probability not to accept a move from the con- 
formation m during exactly k steps is : 

P(k)=w^ 1 (l-w m ) (12) 

Then for each move, the number of MC steps k, dur- 
ing which the chain stays in the current conformation, 
say m, is chosen at random in the density of probability 
P(k) and a move chosen with the following probability 
of transition : 
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l+cxp(A.E™/T) 



(13) 



l+cxp(A_E m ,/T) 



is always performed. This procedure permits to carry out 
MC simulations at very low temperature. All the values 
of w m and t(m — > n) are computed for each temperature 
before performing the MC trajectories. 

The folding times (t/oid) have been computed using 
the BKL type algorithm for low temperature. The fold- 
ing time is the average over 500 trajectories of the num- 
ber of MC steps needed to reach the conformation of 
lowest energy. Three different simulations have been car- 
ried out depending on the choice of the first conforma- 
tion set ; the simulation "T" for which the trap con- 
formation is chosen as the first conformation ; the sim- 
ulation "E" for which the first conformation is an ex- 
tended conformation chosen at random ; the simulation 
" R" for which the first conformation is chosen at random 
among all the conformational space. The transition state 
of lowest energy between the trap and the native struc- 
ture has been determined elsewhere for this sequence fl5| l 
and the difference of energy between the trap and the 
transition state had been computed and equals AE — 
4.53. The Monte-Carlo folding time found by Cieplak et 
al. follows an Arrhenius law tf id(T) = Aexp(SE /T), 
with SE — 2.76 which is in poor agreement with AE. 
For the three simulations, we also find Arrhenius laws 
at very low temperature (T = 0.24, 0.22, 0.20, 0.18). 





SE 


A 


simulation "T" 


4.51 


33.25 


simulation "E" 


4.40 


8.58 


simulation "R" 


4.34 


12.55 



TABLE I. Value of the parameters SE and A of the Arrhe- 
nius laws t f oi d (T) = Aexp(SE/T) for the "T", "E" and "R" 
simulations (see text) 

The results of 5E, shown in table 1, are in very good 
agreement (less than 1% for the "T" simulation) with the 
value of AE and strongly support the proposed method 
for the calculation of the parameters of the Arrhenius 
laws. 

If a first conformation is chosen at random, it can fall 
in the trap valley (TV) , in the native conformation valley 
(NV) or in less important valleys. At low temperature, 
whatever the set of first conformations, the conforma- 
tions which fall in TV govern the kinetics. Then, the 
dominant term in the exponential function of the Arrhe- 
nius law tends always towards AE. The ratio of the A 
coefficients gives the proportion of conformations which 
falls in TV : a random conformation has a probability 
equal to 12.55/33.25 = 0.38 to fall in TV and an ex- 
tended conformation has a probability equal to 0.26 to 
be attracted by TV. Then, these ratios give an insight of 
the attraction strenght of the basin of TV. 



The results presented in this Letter show clearly that 
the proposed MC method is well adapted to the study of 
the dynamics of protein folding. It has been showed that 
not only the difference of energy between the conforma- 
tions has to be taken into account in the MC simulations 
but also the rigidity of the conformations. The method 
had been applied only on a short chain in order to check 
its efficiency, but it is easily applicable to longer chain 
on two- or three-dimensional lattices and moreover the 
BKL algorithm should permit to elucidate low tempera- 
ture properties of protein- like chains. 
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